A Kernel Statistical Test of Independence

Authors

  • Arthur Gretton
  • Kenji Fukumizu
  • Choon Hui Teo
  • Le Song
  • Bernhard Schölkopf
  • Alexander J. Smola
Abstract

Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m²), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
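The sketch below illustrates the HSIC statistic with a permutation-based null distribution, assuming Gaussian kernels with median-heuristic bandwidths; the paper's actual test uses a gamma approximation to the null rather than permutations, and the function names here are illustrative.

```python
# Minimal HSIC independence-test sketch (permutation null), assuming
# Gaussian kernels with median-heuristic bandwidths. X and Y are (m, d)
# arrays of paired samples.
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(X, sigma):
    # Gaussian kernel matrix with bandwidth sigma.
    D2 = cdist(X, X, "sqeuclidean")
    return np.exp(-D2 / (2.0 * sigma ** 2))

def median_bandwidth(X):
    # Median heuristic: median of nonzero pairwise distances.
    D = cdist(X, X, "euclidean")
    return np.median(D[D > 0])

def hsic_biased(K, L):
    # Biased HSIC estimate: (1/m^2) * trace(K H L H), H the centering matrix.
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return np.trace(K @ H @ L @ H) / (m ** 2)

def hsic_permutation_test(X, Y, n_perm=500, rng=None):
    # Returns the HSIC statistic and a permutation p-value for
    # H0: X independent of Y.
    rng = np.random.default_rng(rng)
    K = rbf_kernel(X, median_bandwidth(X))
    L = rbf_kernel(Y, median_bandwidth(Y))
    stat = hsic_biased(K, L)
    null = np.array([
        hsic_biased(K, L[np.ix_(p, p)])
        for p in (rng.permutation(len(Y)) for _ in range(n_perm))
    ])
    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value
```

For independent samples the p-value should be roughly uniform over repeated draws, while for dependent samples it should concentrate near zero.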


Similar articles

Avoiding non-independence in fMRI data analysis: Leave one subject out

Concerns regarding certain fMRI data analysis practices have recently evoked lively debate. The principal concern regards the issue of non-independence, in which an initial statistical test is followed by further non-independent statistical tests. In this report, we propose a simple, practical solution to reduce bias in secondary tests due to non-independence using a leave-one-subject-out (LOSO...
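As a rough illustration of the scheme summarized above, the sketch below assumes a hypothetical per-subject dataset and hypothetical define_roi / evaluate_subject helpers; the only point is that each subject's secondary estimate comes from a selection step performed on the remaining subjects.

```python
# Minimal leave-one-subject-out (LOSO) sketch. `data` is indexed by subject;
# `define_roi` and `evaluate_subject` are hypothetical, user-supplied callables.
import numpy as np

def loso_estimates(data, define_roi, evaluate_subject):
    # Returns one estimate per subject, each computed with an ROI that was
    # selected without using that subject's data.
    n = len(data)
    estimates = []
    for held_out in range(n):
        training = [data[s] for s in range(n) if s != held_out]
        roi = define_roi(training)                               # selection on n-1 subjects
        estimates.append(evaluate_subject(data[held_out], roi))  # independent evaluation
    return np.array(estimates)
```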


Kernel Canonical Correlation Analysis and its Applications to Nonlinear Measures of Association and Test of Independence∗

Measures of association between two sets of random variables have long been of interest to statisticians. Classical canonical correlation analysis can characterize, but is also limited to, linear association. In this article we study some nonlinear association measures using the kernel method. The introduction of kernel methods from the machine learning community has had a great impact on statistic...
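One common way to compute such a kernel-based association measure is regularized kernel canonical correlation analysis. The sketch below, assuming Gaussian kernels and a small ridge parameter kappa, is a generic formulation rather than the exact estimator studied in that article.

```python
# Minimal regularized kernel CCA sketch: largest kernel canonical correlation
# between paired samples X and Y, with Gaussian kernels and ridge term kappa.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def centered_rbf_kernel(X, sigma):
    # Centered Gaussian kernel matrix: H K H with H = I - 11'/m.
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma ** 2))
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def kcca_first_correlation(X, Y, sigma_x, sigma_y, kappa=1e-3):
    Kx = centered_rbf_kernel(X, sigma_x)
    Ky = centered_rbf_kernel(Y, sigma_y)
    m = Kx.shape[0]
    Rx = Kx @ Kx + kappa * np.eye(m)             # regularized Gram term for X
    Ry = Ky @ Ky + kappa * np.eye(m)             # regularized Gram term for Y
    A = Kx @ Ky @ np.linalg.solve(Ry, Ky @ Kx)   # cross-covariance part
    # Generalized eigenproblem A v = rho^2 Rx v; the top eigenvalue is rho^2.
    rho_sq = eigh(A, Rx, eigvals_only=True)[-1]
    return np.sqrt(max(rho_sq, 0.0))
```

With kappa = 0 the estimate degenerates (the empirical correlation is always 1), which is why the ridge term is needed in practice.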


On Kernel Parameter Selection in Hilbert-Schmidt Independence Criterion

The Hilbert-Schmidt independence criterion (HSIC) is a kernel-based statistical independence measure that can be computed very efficiently. However, it requires us to determine the kernel parameters heuristically because no objective model selection method is available. Least-squares mutual information (LSMI) is another statistical independence measure that is based on direct density-ratio esti...


A Kernel Conditional Independence Test for Relational Data

Conditional independence (CI) tests play a central role in statistical inference, machine learning, and causal discovery. Most existing CI tests assume that the samples are independently and identically distributed (i.i.d.). However, this assumption often does not hold in the case of relational data. We define Relational Conditional Independence (RCI), a generalization of CI to the relational s...


An Adaptive Test of Independence with Analytic Kernel Embeddings

A new computationally efficient dependence measure, and an adaptive statistical test of independence, are proposed. The dependence measure is the difference between analytic embeddings of the joint distribution and the product of the marginals, evaluated at a finite set of locations (features). These features are chosen so as to maximize a lower bound on the test power, resulting in a test that...
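A rough sketch of a finite-location dependence statistic of this kind is given below, assuming Gaussian kernels and test locations drawn at random from the sample; the proposed test additionally normalizes the statistic and optimizes the locations to maximize a lower bound on test power.

```python
# Minimal finite-location dependence sketch: difference between the joint
# kernel embedding and the product of marginal embeddings, evaluated at a
# small set of (v, w) locations drawn from the data.
import numpy as np

def gauss(A, B, sigma):
    # Gaussian kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def finite_location_dependence(X, Y, n_locations=10, sigma_x=1.0, sigma_y=1.0, rng=None):
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=n_locations, replace=False)
    V, W = X[idx], Y[idx]                        # test locations drawn from the sample
    Kx = gauss(X, V, sigma_x)                    # m x J kernel evaluations on X
    Ky = gauss(Y, W, sigma_y)                    # m x J kernel evaluations on Y
    joint = (Kx * Ky).mean(axis=0)               # joint embedding at each location
    product = Kx.mean(axis=0) * Ky.mean(axis=0)  # product of marginal embeddings
    return np.sum((joint - product) ** 2)
```

The cost is linear in the sample size for a fixed number of locations, which is what makes this family of tests computationally efficient.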




Publication date: 2007